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ABSTRACT 

In this paper, we define the general fi^amework to describe 
the diffusion operators associated to a positive matrix. We 
define the equations associated to diffusion operators and 
present some general properties of their state vectors. We 
show how this can be applied to prove and improve the 
convergence of a fixed point problem associated to the ma- 
trix iteration scheme, including for distributed computation 
framework. The approach can be understood as a decompo- 
sition of the matrix-vector product operation in elementary 
operations at the vector entry level. 

Categories and Subject Descriptors 

G.1.3 [Mathematics of Computing]: Numerical Anal- 
ysis — Numerical Linear Algebra; G.1.0 [Mathematics of 
Computing]: Numerical Analysis — Parallel algorithms 

General Terms 

Algorithms, Performance 

Keywords 

Linear algebra, numerical computation, iteration, fixed point, 
eigenvector, distributed computation! 

1. INTRODUCTION 

Today, we are living at the heart of the information age 
and we are facing more and more data for which it becomes 
difficult to process using, say, traditional data processing 
strategies. In such a context, iterative methods to solve 
large sparse linear systems have been gaining interests in 
many different research areas and a large number of solu- 
tions/approaches have been studied. For instance, iterative 
techniques gained a significant efficiency by exploiting the 
sparsity of the information structure (decomposition, parti- 
tion strategies etc). However, if a near to optimal specific 
and direct solution can be built for a given problem, it is ob- 
viously hard to have one solution that remains optimal for a 
large classe of problems. One of the most noticeable generic 
improvement was brought by iterative methods combining 
the Krylov subspace and preconditioning approaches [16| . 

*The work presented in this paper has been partially car- 
ried out at LINCS (www.lincs.fr). 



In this paper, we propose a new iterative algorithm based 
on a simple and intuitive fundamental understanding of the 
linear equations as diffusions: we believe that this approach 
may bring significant improvement in a large classe of linear 
problems. More precisely, we study the fixed point conver- 
gence problem in linear algebra exploiting the idea of fiuid 
diffusion associated to the D-iteration [12]. This approach 
has been initially proposed to solve the PageRank equation 
[2] , [13] . The D-iteration solves X G IR'^ of the equation: 

X = 'PX + B 

where P is a non-negative matrix. This includes in particu- 
lar the case where P is of spectral radius unity (and B = Q). 

Solving a linear problem is a very well known theoretical 
problem and there are a large number of methods that have 
been proposed, studied and explored. For the general de- 
scription of existing and/or alternative iteration methods, 
one may refer to [7], [IS], [H], [8], [IZ]. In particular, for 
the power iteration method (solving PX = X), the theory 
is very well known, for instance when P is associated to an 
irreducible transition matrix of a Markov chain, X would 
be its unique stationary distribution (cf. [6], [4]). The con- 
vergence of classical iterative schemes, such as Jacobi or 
Gauss-Seidel or successive over-relaxation method, is also a 
very well known problem. One usual convergence condition 
is that (condition expressed for the equation AX = B, A 
may be chosen equal to I — P) A is strictly diagonal dom- 
inant. With other approaches such as Krylov, conjugate 
gradient and variant methods exploiting the idea of projec- 
tion and residual minimization, better convergence can be 
obtained under more restrictive conditions on A (such as 
symmetric and positive definite). If the power iteration (or 
the usual matrix-vector product) can be associated to linear 
operations on the rows of the matrix (assuming an itera- 
tion is defined by the product matrix- vector), the diffusion 
approach consists in exploiting the columns of P. The diffu- 
sion approach requires in particular that we have to define 
two state vectors F (fluid) and H (history). In this paper, 
we revisit the convergence condition of the D-iteration and 
we show that its convergence can be obtained under a very 
wide condition on P, with an upper bound on its conver- 
gence rate when the spectral radius of P is strictly less than 
unity. One of the main advantage of the diffusion approach 
is that its distance to the limit is explicitly known and that 
the convergence speed gain can be intuitively and simply 
analyzed. 



Concerning iterative methods on distributed computation 
framework, the usual approach is to define a synchronization 
phase: for instance, in GPU computation [2] or MapRe- 
duce framework , the synchronization phase is an explicit 
step of the architecture. For those approaches, the conver- 
gence condition is generally the same as for the sequential 
computation. For an asynchronous distributed computation 
framework, the convergence may be harder to prove or may 
simply fail: [15] is one of the first paper proving the conver- 
gence of the asynchronous distributed computation of power 
iterations under pretty wide conditions. 

The diffusion approach is well suited to an asynchronous 
distributed computation (cf. [Ill IIP)), but its convergence 
had not been proved formally. In this paper, we formally 
prove the convergence of the diffusion approach (D-iteration) 
both for the sequential computation (single processor) and 
for the distributed computation cases. 

We compare the performance of the distributed computa- 
tions based on the theoretical simulation framework intro- 
duced in [15] and show that for sparse large matrices, the 
case for which the diffusion approach was initially designed, 
the computation gain of the proposed method compared to 
the row-based asynchronous parallel computation is signif- 
icantly large, close to the theoretical optimal efficiency for 
large sparse matrices. 

In Section [21 we define the notations and some general 
properties. Section [3] gives the formal framework of the D- 
iteration method and some general results, in particular, 
the theoretical proofs of the convergence for the sequential 
approach (single processor). In Section [l] we show that for 
the most natural three variants of the D-iteration, a simple 
upper bound of the convergence speed can be obtained when 
P can be reduced in a simple multiplicative form. Section [5] 
gives the proof of the convergence of the distributed scheme. 
Finally, Section [6] reports and discusses some experimental 
results. 

2. GENERAL FRAMEWORK 

We will use the following notations: 

• P G (IR^)'^^^ a non-negative matrix; 

• I G (IR+)^'''^ the identity matrix; 

• Ji the matrix with all entries equal to zero except for 
the i-th diagonal term: (Ji)ii = 1; 

. n = {l,..,iV}; 

• I = {ii, 12, .., in, ...} a sequence of nodes: ik G f2; 

• (j„ : IR^ ^ IR defined by a^{X) = if V has 
no zero entry, we define the norm \X\v = X^ili IfjS^jli 

• Li-norm: if X G IR^^, \X\ = E^i l^A; 

• e the normalized unit column vector: 1/N x (1, .., 1)"^; 

• G = {Go, Gi, G2, .., G„, ..} is a sequence of real vectors 
(in IR^) such that J2^=o < 00; 

• outi is the number of non-zero entries of the i-th col- 
umn of P (counts outgoing links) . 



We assume in this paper that P is non-negative for the 
sake of the simplicity. We could generalize some results be- 
low when p(P) < 1, where P is the matrix where each com- 
ponent pij is replaced by \pij \ . 

2.1 Monotonicity 

We say that P is (Tu -decreasing if: 

WX G (lR+)'^,o-4PX) < a^X). 

We define P" = (1 - a)I + aP. 
Then, we have the following results: 

Property 1. -decreasing property is stable by compo- 
sition of operators (matrix product). 

If P IS Uy -decreasing, for all Q > 0, P" is a^- decreasing. 

If P is a V -decreasing, for all {a, a') G (IR^)^ such that 
a < a', a„(P"'X) < cr„(P"X). 

Proof. The first point is obvious. The other points are 
based on the linearity of a^. 

2.2 Diffusion operators 

We define the A'^ diffusion operators associated to P by: 
P, = 1 J, + P.J. 

Property 2. // P is -decreasing, then the diffusion 
operators Pi are -decreasing. Therefore, for a > 0, 

P? = (PO" =I + a(P-I)Ji 

is ay -decreasing. 

Proof. o-„(P,X) = o-„(X) -f o-„(PJiX) - ay{JiX) and 
we have ay{PJiX) < av{JiX), therefore av{PiX) < cr^(X). 
The last point is the application of Property [1] to Pi. 

3. APPLICATION TO D-ITERATION 

The D-iteration is defined by the couple (P, B) G IR^""^ x 
]R^ and exploits two state vectors: H„ (history) and F„ 
(residual fiuid) based on the following iterative equations: 



and 



Fo ^ B 



Ho = = (0,..0)' 

Hn ~ + Ji„-Fn-1. 



(1) 



(2) 



The D-iteration consists in updating the joint iterations 
on {Fn,Hn). Then, variant strategies may be applied de- 
pending on the choice of the sequence I. To recall the 
dependence on P,B and T, we set: H„{P, B,X) — Hn. 
When the limit is well defined we will set H{P, B,X) = 
limn— ^00 H„{P,B,I). 

We will consider two cases: if p(P) < 1 (p is the spec- 
tral radius of P), then we will see that H„{P , B ,1) has a 
limit (denoted also Hoo) which is the unique solution of the 
equation (cf. Theorem [4}: 

X ^PX -\-B. 

If p(P) = 1, then we will only consider in this paper the 
D-iteration with B — P .e — e (or any other vector for which 
<7v{B) — 0, with V a positive left eigenvector of P). 



3.1 General results 

Theorem 1 (Fundamental diffusion equation). We 
have: 

H„+l + F„ + l ^Hn + Fn + P{Hn + l ~ H„) 

and 

H„ + F„ =PH„+Fo. (3) 

Proof. The first equation is straightforward from the 
equations ([1} and ([2]). The second one can be obtained by 
induction. 

The first equation means that what we have (sum of F 
and H) is what we had before plus what's diffused by the 
increment of H. The second equation means that what you 
have is the initial value plus what you received from diffu- 
sion. 

Theorem 2. We have: 

H„ = (I-- J.„(I-P)) J,„B. 

Proof. We can rewrite the equation as 

H„ - Hr,-l = J.„ (B - (I - P)H„_i) . 

Using ([21, we only need to check that Fn-i is equal to B — 
(I — 'P)Hn~i, which is exactly the equation (|3]|. 

3.2 Adding fluids 

Consider the D-iteration J/„(P, B,X) on which we add G 
(we will denote this by Hn(P , B ,T, G)): before each diffu- 
sion we add to Fn the vector G„ . This means that we modify 
Fn and Hn equations as follow; 

Fn ~ Pi„ {F„-l + Gn-l) 

and 

H„ = Hn-l + Ji„ {F„-l + Gn-l)- 

Then we have the following result: 
Theorem 3. We have: 

Hn + l + F„ + l = Hn + Fn + Gn + P{Hn + l — Hn) 

and 

n-l 

Hn + Fn=PHn+Fo + Y,Gi. (4) 

3.3 Convergence 

We first show the convergence of //„(P, BjX) when p(P) < 
1. This is a quite intuitive result and we only require that 
X is a fair sequence. 

Definition 1. A sequence X is fair if the number of oc- 
currences of each i £ Q is unbounded. 

Remark 1. If we have i £ Q such that {F„)i is equal to 
zero after finite steps no, we don't need the fairness condition 
for the position i (for the convergence). 

Lemma 1. // P is irreducible and p{P) < 1, then there 
exists a (strictly) positive vector V such that \Fn\v is non- 
increasing. As a consequence, Fn is convergent for any given 
sequence X. 



Proof. Set V the left positive eigenvector of P for p(P) 
{V is the left Perron vector, cf. [6], [4]). 

Set j = in+i and / = (-Fn)i„+i, then: 

\Fn-t-l\v =^ |(-F„+l),i;i| -\~ \{Fn + l)jVj\ 

= 5Z K-^")' + /P'jI"' + \fpjj'"j\- 

Let's call A the set of nodes i such that {F„)i has a sign 
opposed to /. Then, 

\F„ + l\y =^(|(-F„),| + \fp^j\)v^ + \fPjj\Vj 

+ + fP^J\ - \{Fn)^\ ~ \fp^A>^ 

= |i^n|. + ^(|(fn), + .fp^J\ ~ |(Fn),| - \fp^A)^^ 
iSA 

^ I Fn I V . 

For the last inequality, we used |a;-|-i/| < |a;|-|-|j/|. Therefore, 
we have l-Fn+il^ < |-Fn|u. Therefore, F„ is convergent. 

Remark 2. The above Lemma holds for any matrix P, if 
there exists a positive vector V such that for all j, I ^ 

n,) < Vj. 

Lemma 2. If B — Bi -\- B2, then for all n, we have 

Hn{P,B,X) ^ Hn{P,B^,X) + Hn{P,B2,X). 

Proof. The proof is straightforward using the linearity 
of H„ w.r.t. F„. 

3.3.1 Casep{P)<l 

Theorem 4. If p{P) < 1, for any fair sequence X, Hn{P, B,X) 
is convergent to the unique vector X such that X — PX-\-B. 

Proof. Let's first assume that B is non-negative, so that 
we only manipulate non-negative quantities. By construc- 
tion, Hn is non-decreasing per entry. From the equation ([3l, 
we have: Hn = (I-P)"! (S-K). Hence, Hn < {I-P)-'B, 
therefore Hn is convergent and because X is a fair sequence, 
necessarily, Fn tends to zero. Then, its limit satisfies the 
claimed equation. Now, if B has negative and positive 
terms, we can decompose B as 5+ — B~ and apply the 
same argument for each component. 

Lemma 3. If p(P) < 1, then for all {a, P) £ IE?, 

aH{P,B,X) + I3H{P,B',X) = H{P,aB + PB' ,X). 
Proof. We have: 
H{P,aB -\- I3B',X) = (I - P)"^(aB -h 

= a(I - Py^B -h /3(I - py^B' 
= aH{P,B,X)+PH{P,B',X). 

Theorems (Superposition). Let S ^^"^^^Gn with 
E,T=o \Gn\ < oc. Ifp{P) < 1, thenH{P,B,X,G) = H{P,B+ 
S,X). 

Proof. We have from the equation Q: Hn{P, B,X, G) = 
(I - P)-\B + Er=",' - Fn). Then, one can easily check 
that the difference | X]i^L„ Gi — Fn\ tends to zero and the 
equality holds. 



Theorem 6 (Monotonicity). Let S = YT'^o 
J:Zo \Gn\ < ^. Let S' = Er=oG; with E^Lo l^^l < oo 
such that Gn < G'r,. Ifp(P) < 1, then for all n, H„{V,B,I,G) < 
H4P,B,I, G'). 

Proof. The proof is straightforward by induction using 
the iterative equations of Fn and Hn {F„ < and Hn < 

3.3.2 Case p(P) = 1 

In this case, we assume that the initial vector B satisfies 
(7v{B) — 0, with V the left eigenvector of P for eigenvalue 
1. 

Let's first consider the case when P is irreducible. We 
denote by Q'^ (resp. ) the subset of f2 such that {Fn)i > 
(resp. < 0). 

Lemma 4. The diffusion operators preserve av , which means 
that: 

cr„(PiX) = cr„(X). 

Proof. o-„(P,X) = cr„(X) + a^{{P - I)JiX). And we 
have cr4(P - I)J,X) = (V^P - y^)J,X = 0. 

Lemma 5. For all n, av{Fn) — 0. 

Proof. This is a direct consequence of Lemma |4] and 
o"i,(-Fo) = by assumption. 

Lemma 6. // at each diffusion step, in € f2j (resp. in £ 
0,n), then Q,n (resp. converges to f2+ (resp. QT ). 

Proof. It is clear that the subset Q,n is non-decreasing 
(we only add positive quantities to each node). It is bounded 
by Q,, hence convergent. 

We denote by 1+ (resp. I") a fair sequence on Q,'^ = 
yj'^-]Q,n (resp. = UjJ^if2^), then, we have the following 
result. 

Theorem 7. //P is irreducible and I = I+, then Hn(P, P.e 
e,X) is convergent to X — e where X is the real right eigen- 
vector ofP for the eigenvalue 1 with miui Xi = 1/N . 

Proof. If there exists T < oo such that U^^if2+ = il, 
then cr^(_Fjt) = (j^(_Fj7) = and Hn converged in finite 
time. Otherwise, is strictly included in Q,. Let P+ be 
the restriction of P on f^+i then /9(P"'') < 1 (cf. [4]). 

At the limit, H satisfies {H + e) = ^{H + e) with H + ea. 
positive vector. There is at least one coordinate i on which 
the diffusion operator has been never applied (with positive 
fluid), therefore xmni{H)i — 0. 

Remark 3. // P is not irreducible, the diffusion on X+ 
may not converge. The counter example is easy to be found. 

In order to prove the convergence for the sequence I~ 
without assuming the irreducibility of P, we consider a bit 
more general diffusion iterations as follows: 

f: = pr„"i^"-i (5) 

and 

H" = Hn-l + anJi„Fn-i (6) 

where On > 0. If for all n, a„ = 1, we have the usual 
diffusion iteration. 



Theorem 8. {Fn,Hn) satisfies: 

m + f: = PH^ + B. (7) 

Proof. The proof is the same as for the case a = 1, by 
induction and using equations ([5]) and ((6)1. 

Theorem 9. Assume we choose Fo > and Ho = 0. 
Then, Fn and Hn are positive and {H^)i is an increasing 
function for all i. 

If P is -decreasing, then av{F^) is a decreasing func- 
tion. 

Proof. The proof is straightforward. 

Theorem 10 (Partial diffusion). If we build the two 
diffusion iterations {Fn,H^) and {Fn ,Hn ) from the same 
initial vector Fo (Ho ~ 0) and for the same diffusion se- 
quence I, if for all n, < an < < 1, then we have: 

• 'T^Fn"') <a„(F„"); 

• Hn > H" (for each vector entry); 

. HZ +F;:' >h: + f::. 

Proof. The first inequality is a direct consequence of 
Property [5] and Property [T] For the second and third in- 
equalities, we prove by induction: we have obviously Hq > 
Ho and Hq' -f Fq"' > Hq -^-Fq. Assume we have, Hn > H^ 
and H^' - H^ > F^ ~ F^' . Then, from ®: 

Hn+l ~ Hn -\- Qn4-lJj„ + i-F„ 
1 Fn 

and 

Hn + l ~ Hn + an + lJ i„^iFn 

We first prove that: 

Hn — Hn > a„ + lJi^^j(F^ — Fn )■ 

For i / in+i, {H:' - HZ), > and (J.„+i(F„" - K'))i = 
0. For i = in+i, we only need to handle the case {F" — 
Fn')i„+i > and we use the relation: H^' - H^ > F^ - 
Fn . Hence, he have the inequality H^+i > H^+i- Then, 
Hn+l — HZ+i > Fn+i — Fn+i is Straightforward using the 
equation ([7]) of Theorem [8l 

Remark 4. The power iteration Xn = PXn-i can be 
described m the above scheme {Fn,HZ) with Xn = F^n 
taking Fo = Xo and if we apply the cyclic sequence 1,..,N 
(in = n mod N) where ctkN+i < 1 is chosen such that we 
diffuse exactly (P'^Xq); (such a value exists and is less than 
1 because after the diffusion of nodes l,..,i the residual fluid 
on (i -f l)-th node can only be increased). 

Remark 5. Note that the above theorem is valid without 
any condition on p{P). 

Theorem 11. Assume we have a strictly positive left- 
eigenvector V > Q of P . If we choose any fair sequence 
of nodes T = (we only diffuse negative fluids), then the 
diffusion applied on (P,Pe — e) converges to a unique H^o 
such that Hoo -\- e is the right-eigenvector for eigenvalue 1 
for P , such that the maximum value within each strongly 
connected component of spectral radius one is equal to 1/N. 



Proof. The diffusion from P.e — e can be decomposed 
as the difference of two diffusion process {F^ , ) and 
{F",H") as follows: we start with Fq = e. For the A'' first 
diffusions, we choose in = n and 

• for Pf;- , Q„ = 0; 

• for P"", a'n such that we diffuse exactly 1/N from all 
nodes (such a value exists and is less than 1 because 
after the diffusion of nodes 1, .., i the residual fluid on 
(i + l)-th node can only be increased). 

Then we have: F^ = e, H"^ = Q and F^' = P.e, H"^ = e. 
Then from the (A'^ + l)-th diffusion, we apply exactly the 
same sequence with an = a'^ = 1- For n > N, from Theo- 
rem[TOl(and linearity of H, Lemma[2] w.r.t. F„ = F^ —F"), 
we have Hn + e = Hn — Hn > and we have o^^Fn) = 
<^v{Fn — Fn) = 0. If we only diffuse negative fluids, this 
means that Hn is a decreasing function (per entry). Since 
we have < Hn + e < e, Hn is convergent. We proved 
above (Lemma[T} that \Fn\v = X^i I''' ^ (^n)il is ^ decreas- 
ing function. The convergence of Hn implies of course the 
convergence to zero of Fn on the coordinate iii > and we 
have H + e — P(i/ + e). Now we still have to prove that 
H + e ^ 0. In fact, there exists at least one entry such that 
{H)i = 0: take i G (here we require that f2j is only 
for entries strictly positive). If the iteration stops in finite 
time, the node in the last non empty satisfies {H)i = 0. 
Otherwise, there exists at least one node such that it stays 
always strictly positive, for which we have {H)i = 0. Hence, 
maxi(J/ + e)i = 1/N. The argument here applies for each 
strongly connected component (SCC) of spectral radius one 
(SCC(l)). For the uniqueness, we can decompose P in three 
parts (cf. [4]): 

• those who point to SCC(l)s and are not pointed to by 
any SCC(l)s: for them we have {H)i — 0; 

• those in SCC(l)s: for each SCC(l), H restricted on a 
SCC(l) is equal to the unique eigenvector of SCC(l) 
(Perron eigenvector) such that the maximum is equal 
to 1/N; 

• those who are pointed to by SCC(l)s and are not 
pointing to any SCC(l)s: this case corresponds to the 
previous section (diffusion on p < 1) for which the ini- 
tial condition results from Pe— e and the fluids received 
from SCC(l)s, on which we have uniqueness. 

In order to justify clearly the above decomposition, we 
argue as follows: because of the existence of a strictly posi- 
tive left-eigenvector of P (for 1), we can decompose Q as in 
Figure [T] 

where the restriction of P on the subsets IN and OUT has 
a spectral radius strictly less then 1. 

The diffusion of nodes in IN is independent of the rest 
and its diffusion limit is {H -\- e)i^iN = as shown in the 
following Lemma: 

Lemma 7. // p(P) < 1, and if we apply only negative 
diffusion from the initial condition Fo = P.e — e, then the 
limit IS H — —e. 

Proof. Since p(P) < 1, the convergence of H is clear. 
We need to justify why diffusing only negative fluid, we do 




Figure 1: Decomposition. 



not end up with some strictly positive coordinates. From 
Theorem 1101 we have Hn -I- e > 0. Then from the equation 
Q, one can easily get: cr„(F„) = {p{F) — l)av{Hn+e), which 
means that Fn has always some negative components. At 
the limit, we can not have any negative fluids and a-v{Foo) = 
(p(P) — l)ay{Hao+e), which implies H^ = — e and F^^ = 0. 

From the decomposition of Figure [1] once we have that 
{H)i + 1/N = on the nodes in IN, the diffusion of each 
SCC(l) is exactly the independent diffusion of P restricted 
to each SCC(l) component, which converges to the unique 
eigenvector for 1. Finally, the nodes of OUT converges to 
a unique solution, since again we have p{OUT) < 1 and we 
can apply Theorem [5] This ends the proof of the theorem. 

Remark 6. // V has some coordinates equal to zero, the 
above convergence works only on i such that Vi > and we 
can have {Foo)i > on i such that Vi — 0. Let be 
the set coordinates such that {Foo)i > 0. Then the spectral 
radius of P restricted to £7+°° may be one, m which case, 
there is no convergence. 

Remark 7. // we mix the diffusion of positive and nega- 
tive fluids, there is no guarantee that the D-iteration algo- 
rithm converges. For instance, with a snake- configuration 
counter example, we may oscillate ( take for instance, P as- 
sociated to a five nodes graph: 1 — >■ 2 and 1 — !■ 3; 2 — > 4; 
3 ^ 5; 4^ 1; 5 ^ 1). 

Theorem 12 (Monotonicity 2). Let S = Y.n=o'^^ 

oo. Ifforalln, Er=o < Er=o then for alln, Hn{F,B,I,G) < 
Hn{P,B,I, G'). 

Proof. This is a particular case of the partial diffusion 
resuh (Theorem fTO)) : //„(P, B,l, G) can be seen as Hn(P, B+ 
G',I) where at step n, Ei"ln+i G'i-\-Y^"^Q{G'i-Gi) has been 
blocked. Hn(P,B,I,G') can be seen as H„{P,B -\- G',I) 
where at step n, Ei^n+i been blocked. 

4. CONVERGENCE SPEED 

For the sake of the simplicity, we only considered here the 
case of sub-stochastic matrices of the form: P = d.Ps where 
< d < 1 and Ps is a transition matrix. 

We consider the following choice of sequence: 



CYC: 



n mod A''; 



MAX: in = argmaxi(F„)i; 
COST: in = argmaxi {{Fn)i/outi). 



Let's first consider tlie tlieoretical cost of tlie iteration as 
tiie number of times diffusions are applied. 

Theorem 13. The convergence speed of CYC and MAX 
is at least d^'^'^^ , where I is the number time diffusions are 
applied. 

Proof. CYC: from F„, after N diffusions, we diffused at 
least \Fn\, therefore, |F„+iv| < d.|F„|. 

MAX: if we order {Fn)i: (Fn)i > {Fn)2 > ...{F„)n, we start 
by diffusing {Fn)i, and at the fc-th diffusion we take a fluid 
at least equal to the fc-th term {Fn)k- As for CYC, we dif- 
fused at least \F„\ after N diffusions. 

Now let's consider the theoretical cost of the iteration as 
the number of times a non-zero element of P is used. 

Theorem 14. The convergence speed of COST is at least 
d'-''^^^, where L = 'Y^^outi and I is the number of times a 
non-zero element of P is used. 

Proof. We just need to count as above the amount of 
fluid we move while using L links of P. At step n, as- 
sume we order {Fn)i/outi : {Fn)i/outi > {F„)2/out2 > 
■• > {Fn)N / outN . At n-l- 1, we diffuse exactly which 
costs out\. Assume we have diffused Dk-i with L^-i op- 
erations. Then, At step n -\- k, we diffuse with cost 
such that dk/lk > {F„)k/outk. Hence, we have Dk/Lk > 
iJ2'^=iiFri)%)/{Y^'^^iOutt) > \F„\/L. If we stop counting 
when Lk just exceeds L (say fco), then, Dkg > \Fn\/LxLko > 

Remark 8. An interesting heuristic bound is d/{2 — d) 
assuming both the graph and the sequence are random. 

5. CONVERGENCE OF DISTRIBUTED AL- 
GORITHM 

5.1 Case p < i 

Here we first consider the case p(P) < 1. We detail the 
proof of the convergence for K = 2 when the computation 
is done on two virtual processors, that we call PIDl and 
P1D2. We assume a fixed partition of il = Qi LI Q2 and the 
corresponding decomposition of P as: 

\P21 P22J 

The diffusion state variables of two PIDs are denoted by: 
(F^,H^), {F^,H^). For a single PID sequential system, we 
note H = {[H]\ [Hf). 

To prove the convergence of the distributed architecture 
to the proper eigenvector, we consider the following decom- 
position: we assume that PIDi computes the D-iteration 
iy(Pii, _Bi,X^) with additional inputs from PID2 at times 
Ti, ..r„, .... During t„ ^ T„ - r„_i (we set To = 0), PIDi 
applies the diffusion sequence I^. We assume without any 
loss of generality that PID2 computes the D-iteration based 
on the sequence (I^ during T^ — T^_i) and sends at T'„ 
(n > 1) the quantity 

F^2{H^^-H^^_f^'AHl 



Let's first assume that this quantity defines the additional 
input at Tn by PIDi (one way). The information transmis- 
sion delay is then defined by T„ —T^. We will see that we 
don't need that the information from PID2 arrives in the 
order for the proper convergence. The only requirement is 
that Tn - Tn and Tn - T^-i are bounded. 
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Figure 2: Data exchange: from PID2 to PIDl. 

If n is such that Tk < n < Tk+i, we have (upper bound): 

Hn =Hti (Pll , Bi , Ii ) 

+ h^,(Pii,f:^^+ah!,x^) 

+ H^,{Pii,F^^_^+AHLiX) 
+ i/„_T, (Pii , Ft, + AHl , Xl+i) 
=(I-Pii)~'(Bi + Pi2//|., -P^). 

Hence, 

Hl<(l-Pii)-\Bi+Pi2Hl). (8) 
In the above, we used the equality 

Hr,{Pii,FT,_,+AHti,X) = {I-Pii)-\FT^_^+AHti-FT,). 

The above result shows that only the cumulated fluids mat- 
ter for PIDl. 

Let's now use the notation of the G (fluid addition): if n = 
Ti, we set = AHf, otherwise Gn = 0. We assume S = 
ESiC? = P12HI < 00. Then Hi = i/n(Pii, G') 
(we define G^ in the same way). From Theorem O we have 
Hi,^H{Pri,Bj+G,X'). 

Now consider the global system. Let's denote by {Tn,Tn) 
the sending time and the receiving time of the n-th fiuid 
transmission exchange between PIDs (in both directions). 
Because the different PIDs may have different computation 
speed, we may have that at T„ the number of iterations done 
by the PID that receives the fluid is less than the number 
of iterations done by the sender at time Tn- For the sake of 
the simplicity, we still denote by Ft„ (same for H and for 
T^) the value of the F at time Tn. 

Lemma 8 (Impact of the delay). Consider two dis- 
tributed computation systems 5(1), 5(2) that have exactly 
the same Tn and that differs only by Tn such that for all 
n, Tn(l) < Tn{2). Then, we have: Hi{l) > Hl{2) and 
Hl{l)>Hl{2). 

Proof. This is a direct consequence of Theorem 1121 

Lemma 9. At each step, he have Hn < [H\\ and H^ < 
[H]n. Therefore, the distributed computation scheme con- 
verges. 



Proof. From Lemma [S] we first have the monotonicity 
when reducing the delay up to zero delay T„ = T^. Then, 
again thanks to Theorem 1 121 one can show that adding more 
fluid exchanges, we can only increase the history vector. 
The system with zero delay and exchange at each step cor- 
responds to the sequential system H . Therefore, we have 
Hi < imi < and < [H]l < [i/]^, and and 

are convergent. 

Theorem 15. // each PID applies a fair sequence for dif- 
fusions and if for all n, \T^-Ti\ <T and \Tl, - Ti_i\ < T, 
then the distributed computation converges to the sequential 
single PID computation H . 

Proof. From the equation we still have: < (I — 
Pii)-'(Bi + Pi2//^) andH^ < (I - P22)-'(B2 + P2i//^). 

If for all n, \T„ - T^] <T and \Ti - T^^j] < T we have 
(lower bound): 

H'„+2T > (I - Pll)"'(Bl + Pl2H^n - F^+2t) 

and 

^^n+2T > (I - P22)-'(B2 + P2lHi - F^+,r) 

We already know that both converges (non-decreasing and 
upper bounded by [H]l^ and [H]l^), therefore -F^+2t and 
Fn+2T goes to zero and at the limit we have: 

Hl, = {I-Pii)-\B^+Pi2Hl) 

and 

Hi = (I-P22)"'(B2 + P2li/i,). 

Therefore, we have — [H]l and = [H]^ (because 
H is the unique solution satisfying the above two equations) . 

Now, the generalization of this proof to any K > 1 follows 
exactly the same ideas. 

5.2 Case p = i 

Here, we assume that there exists V a strictly positive 
left eigenvector of P and that the initial vector B satisfies 
f7v{B) = 0. Let's consider the limit of the diffusion on n~. 

We use the same notation than in the previous section. 
Since we only apply the diffusion on nodes having negative 
fluids, the history vector H is for each entry a negative de- 
creasing function. 

In the previous case with p < 1, we could assume that 
and are predefined and independent of the fluid ex- 
changes. The flrst difficulty here is that a priori we can no 
more do such an assumption, since the sign of {F)i depends 
on the past fluid exchanges. 

To overcome this difficulty and apply the previous results, 
we do the following trick which consists in the decomposi- 
tion of the diffusion in two steps: selection of a node and 
diffusion test. We assume given two fair sequences X^ and 
X^ on f2i and Q2 (fair means here that every coordinate will 
be candidate for diffusion an infinite number of times) and 
when a node is selected, we only diffuse if its sign is negative. 

Then, we have the following result: 

Theorem 16 (Monotonicity extended). Let Gn and 
G'n two sequences of non-positive vectors, such that for all 
'^i X^r^i < G'i- Let X be a fair sequence. If we only 

diffuse the negative fluids, then for all n. Hn{P , B,X,G) < 
Hn{P,B,X,G'). 



Proof. The argument is exactly the one used in Theo- 
rem [12] 

Then, the results of Lemma[8]and Lemma[9]still hold with 
inversed inequality. 

Theorem 17. // each PID applies a fair sequence for dif- 
fusions and if for all n, \T^-T'^\ <T and \Ti - T',;_i | < T, 
then the distributed computation on Sl^ converges to the se- 
quential single PID computation H obtained by the diffusion 
of negative fluid. 

Proof. We have the lower bounds: [II]l < [HW < 
and [Hf^ < [H]l < H^, therefore and are conver- 
gent. 

We know that the limit of the distributed computation 
satisfies: 

{I-Pii)Hl=Bi+Pr2Hl (9) 
(I - P22)Hl =B2 + P21HI. (10) 

Let Z = {{HLf,{Hlf f - H. If p(Pn) < 1 and 
p(P22) < 1, then we have the uniqueness of the solution and 
Z = 0. If p{Pii) = 1 and p(P22) = 1, then since p(P) = 1 
and since we assumed the existence of strictly positive left- 
eigenvector for 1, we can not have any positive terms linking 
fii and fl2 (cf. [4]), hence we have two independent systems 
for which the uniqueness was proved (Theorem [TT]). Finally, 
if p(Pii) — 1 and p(P22) < 1, then we must have P12 = 0, 
and for the same reason, we still have Z = 0. 

6. EXPERIMENTATION 

6.1 Uniform graph with iv = 128 

We first report the experiment results by reproducing the 
hypothetical parallel computer simulations as defined in |15] 
(Table 1). We reproduced a random (symmetrical) graph of 
128 nodes containing 1652 links, 23 of which are self-loops. 
As in [15], we assumed that a processor spends cycles to 
read and write its local memory, Tr and Tu, cycles, respec- 
tively, to read from and write into the shared memory, Tm 
cycles for one multiplication and Ta cycles for one addition 
(below, we took Tr = 4, T„ = 2, = = 1). The degrees 
of the nodes vary from 14 to 38, with a mean of 25.6 and 
standard deviation of 4.9. In Table [TJ we show the results 
of: 

• sPI-R: synchronized power iteration per row (i.e. Ja- 
cobi iteration); 

• aPI-R: asynchronized power iteration per row (method 
evaluated in [15]); 

• sPI-C: synchronized power iteration per column; 

• sPI-Cr: synchronized power iteration per column as- 
suming identical weight pij for each column j (as in 
PageRank equation); 

• DI+COST: asynchronized diffusion applying COST from 
the initial vector P.e — e (cost of the computation of 
P.e — e is included). 

We see that the results of sPI-R and aPI-R are very close 
to those observed in Table 1 of [15] . With sPI-C, we see a 



K 


sPI-R 


aPI-R 


sPI-C 


sPI-Cr 


DI+COST 


1 


65620 


45934 


65620 


34090 


18183 


2 


66020 


45472 


37500 


21470 


20047 


4 


41620 


28504 


23180 


15030 


17347 


8 


24000 


16590 


16000 


11770 


13711 


16 


13060 


9142 


11940 


9780 


9952 


32 


7880 


5056 


9980 


8720 


6469 


64 


4120 


3056 


8210 


7550 


4264 


128 


2260 


2256 


7215 


6845 


3128 



Table 1: Uniform graph iV = 128. 



gain for K = 2,4,8: that's because the fluid exchange to 
other PIDs aggregate the results on the diffusions of N/K 
nodes to N/K * {K — 1) nodes. However, there is a higher 
penalty when K becomes close to A'^. This can be explained 
considering the limit case of K = A'': whereas one coordi- 
nate updates on a row having n,- non-zero values requires 
about 2 X rir additions and multiplications followed by one 
writing cycle, the diffusion would require 2 x tIc additions 
and multiplications (for a column having Uc non-zero val- 
ues) followed by ric updates (read, addition and write) of 
shared memory. The result for sPI-Cr shows the gain when 
exploiting the homogeneity of the column weight (one mul- 
tiplication required per diffusion), which can be done only 
when column based operations are used (as for the diffusion 
approach). 
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Figure 3: Normalized speeds. 

Figure [3] shows the results of Table [T] in terms of the con- 
vergence speeds normalized by the value of sPI-R for K — 1. 
Ideal-PI and Ideal-DI are the ideal curve {y = x) starting 
from if = 1 of PI and DI. 

6.2 Web graph 

In this section, we used the web graph imported from the 
dataset uk-2007-05 @1000000 (available on [T]) which has 
41,247,159 links on 10*' nodes. The aim is to compare the 
theoretical computation cost for a large sparse matrix such 
as the one associated to a web graph, the case for which the 
diffusion approach was initially designed. The A'^ — 1000 
case is obtained considering the first 1000 nodes of the above 
graph. Here the convergence is on the PageRank equation 
(eigenvector with damping factor of 0.85). The results are 
shown on Figure H for N = 1000 and Figure [5] for iV = 10*^. 
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Figure 4: Normalized speeds: = 1000. 



In Figure 131 the benefit of the column based diffusion ap- 
proach is much more significant and even more significant 
for A'^ = 10^ (Figure O. In Figure O we added the perfor- 
mance of a dynamical partition approach (cf. [10] for more 
details: the dynamical partition used here consists roughly 
in observing the convergence speed in logscale of each PID 
based on its remaining fiuid quantity and transferring 10% 
of nodes that is managed by the slowest PID to the fastest 
PID when the difference is higher than 50%) in order to 
check/validate the property claimed in [10]: we can see that 
applying the cost assumption/model of lU a simple dynam- 
ical partition strategy on the diffusion method leads to a 
performance that is close to the optimal efficiency (close 
to the ideal curve). Note that the cost of the partition up- 
dates/modifications is partially included here (unity cost per 
node exchanged on both sides of PIDs). 



sPI-R - 
aPI-R 
spl-Cr 
DI+COST 
Ideal from Pi' 
Ideal from Dl 
DI+CrOST DYN 



Figure 5: Normalized speeds: A" — 10 
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7. CONCLUSION 

In this paper, we addressed the formal proof of conver- 
gence of different D-iteration schemes, including the case 



of the distributed computation and the upper bounds of 
the convergence speeds. We used the theoretical formal ap- 
proach of [TS] to evaluated the theoretical computation cost 
and showed the significant gain of our diffusion approach for 
the computation of the eigenvector of large sparse matrices. 
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